Prediction of metabolic syndrome and its associated risk factors in patients with chronic kidney disease using machine learning techniques

Abstract Introduction: Chronic kidney disease (CKD) and metabolic syndrome (MS) are recognized public health problems related to overweight and cardiometabolic factors. The aim of this study was to develop a model to predict MS in people with CKD. Methods: This was a prospective cross-sectional study of patients from a reference center in São Luís, MA, Brazil. The sample included adult volunteers classified according to the presence of mild or severe CKD. For MS screening, the k-nearest neighbors (KNN) classifier algorithm was used with the following inputs: gender, smoking, neck circumference, and waist-to-hip ratio. Results were considered significant at p < 0.05. Results: A total of 196 adult patients were evaluated (mean age 44.73 years; 71.9% female; 69.4% overweight; 12.24% with CKD). Of the patients with CKD, 45.8% had MS, most had up to three altered metabolic components, and the CKD group differed significantly in waist circumference, systolic blood pressure, diastolic blood pressure, and fasting blood glucose. The KNN algorithm proved to be a good predictor for MS screening, with 79% accuracy, 79% sensitivity, and 80% specificity (area under the ROC curve, AUC = 0.79). Conclusion: The KNN algorithm can be used as a low-cost screening method to evaluate the presence of MS in people with CKD.


Introduction
Chronic non-communicable diseases (CNCD) are currently recognized as one of the major public health problems 1. The World Health Organization (WHO) estimates that CNCDs are responsible for 71% of the 57 million deaths worldwide 2. In Brazil, CNCDs account for 76.4% of all deaths, mainly from diseases of the circulatory system (28% of deaths), cancer (18%), respiratory diseases (6%), and diabetes mellitus (5%) 3.
Among the CNCDs, chronic kidney disease (CKD) stands out. CKD is defined as an abnormality of kidney structure or function, present for more than three months, with implications for health. These abnormalities may be represented by a decreased glomerular filtration rate (GFR < 60 mL/min/1.73 m²) or by the presence of one or more markers of kidney damage 4,5.
The prevalence of CKD is still unknown in many countries 6. However, it has been increasing, mainly as a result of the rising incidence of obesity, diabetes, and hypertension. In addition, renal function is highly susceptible to age-related changes, and CKD incidence is significantly higher in middle-aged and elderly patients 7,8.
People with CKD frequently have cardiovascular disease (CVD) 8. CVD is the leading cause of death in patients with CKD and is associated with accelerated CKD progression. These findings support the view that the combination of cardiometabolic risk factors (CRF) and impaired kidney function may increase kidney disease-related risks 9.
Metabolic syndrome (MS), in turn, is considered a cluster of interrelated risk factors that doubles the risk of CVD over 5 to 10 years 10. It is described as a set of CRF usually related to the development of insulin resistance and fat accumulation, including arterial hypertension, hypertriglyceridemia, dyslipidemia, hyperglycemia, and central obesity 11.
Individuals with CNCDs generate substantial costs for the public health system because they require ongoing treatment to control these diseases, particularly MS and CKD. Patients with end-stage CKD, stage G5 (GFR < 15 mL/min/1.73 m²), have severe renal failure leading to complete loss of renal function. At this stage, the therapeutic options are renal replacement therapies (RRT), namely artificial blood purification methods (peritoneal dialysis or hemodialysis) or kidney transplantation 4. In Brazil, RRT is the main treatment and entails high costs for health services 12. Therefore, early detection of these conditions can delay complications and support appropriate interventions, such as screening tests in high-risk groups 13.
In this context, some data analysis techniques appear to be good solutions for providing more accurate predictions about an individual's health 14. Machine learning (ML) techniques, in particular, may help develop and improve new methods for diagnosis and/or screening 15,16.
Predicting MS in the CKD population is therefore important, since MS is associated with progression to more advanced CKD stages and a higher risk of death from cardiovascular events in this population. Even when patients with CKD present few MS risk factors, preventive measures should be taken to avoid negative outcomes such as early death.
Given the magnitude of MS and CKD and their related complications, efforts should be made to enable studies aimed at the early diagnosis of these conditions. Accordingly, this study aimed to develop a model to predict the risk of MS and its associated risk factors in people with CKD.

Methods
This was a prospective cross-sectional study of patients of both sexes from the Nephrology Reference Center in São Luís, MA, Brazil, conducted between January 2018 and July 2020. The sample consisted of 196 volunteers classified according to their health status. Participants were classified as having severe CKD when the glomerular filtration rate was < 60 mL/min/1.73 m² and mild CKD when it was at or above this value 4. GFR was estimated from serum creatinine using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation 17.
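The GFR estimation and the mild/severe grouping described above can be sketched as follows. This is a minimal illustration that assumes the original 2009 CKD-EPI creatinine equation with serum creatinine in mg/dL; the text does not say which CKD-EPI version or whether a race coefficient was applied, and the function names are hypothetical.

```python
def ckd_epi_2009(scr_mg_dl: float, age: float, female: bool,
                 black: bool = False) -> float:
    """eGFR in mL/min/1.73 m² from the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9        # sex-specific creatinine knot (mg/dL)
    alpha = -0.329 if female else -0.411  # exponent applied below the knot
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:  # race coefficient of the 2009 equation; off by default here
        egfr *= 1.159
    return egfr


def ckd_group(egfr: float, threshold: float = 60.0) -> str:
    """Study grouping: 'severe' below 60 mL/min/1.73 m², 'mild' otherwise."""
    return "severe" if egfr < threshold else "mild"
```

For example, a 50-year-old man with serum creatinine of 0.9 mg/dL gets an eGFR of roughly 99.2 mL/min/1.73 m² and falls in the mild group.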
A suspected diagnosis of MS was defined according to the International Diabetes Federation (IDF) 18: increased waist circumference (≥ 90 cm for men and ≥ 80 cm for women) as a mandatory criterion, plus at least two other altered components. These components are: triglycerides ≥ 150 mg/dL or treatment for dyslipidemia; HDL cholesterol < 40 mg/dL for men and < 50 mg/dL for women or treatment for dyslipidemia; systolic blood pressure (SBP) ≥ 130 mmHg, diastolic blood pressure (DBP) ≥ 85 mmHg, or use of antihypertensive medication; and fasting blood glucose ≥ 100 mg/dL or a previous diagnosis of diabetes mellitus 19.
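The IDF rule above is a simple decision rule and can be written down directly. The sketch below is illustrative only: the `Patient` record and its field names are hypothetical, and a single dyslipidemia-treatment flag counts toward both the triglyceride and the HDL components, following the wording of the criteria as given in the text.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    male: bool
    waist_cm: float
    triglycerides_mg_dl: float
    hdl_mg_dl: float
    sbp_mmhg: float
    dbp_mmhg: float
    fasting_glucose_mg_dl: float
    dyslipidemia_treatment: bool = False
    antihypertensive_use: bool = False
    diabetes_diagnosis: bool = False


def idf_components(p: Patient) -> int:
    """Count the altered IDF components other than waist circumference."""
    high_tg = p.triglycerides_mg_dl >= 150 or p.dyslipidemia_treatment
    low_hdl = p.hdl_mg_dl < (40 if p.male else 50) or p.dyslipidemia_treatment
    high_bp = p.sbp_mmhg >= 130 or p.dbp_mmhg >= 85 or p.antihypertensive_use
    high_glucose = p.fasting_glucose_mg_dl >= 100 or p.diabetes_diagnosis
    return sum([high_tg, low_hdl, high_bp, high_glucose])


def has_metabolic_syndrome(p: Patient) -> bool:
    """Mandatory central obesity plus at least two other altered components."""
    central_obesity = p.waist_cm >= (90 if p.male else 80)
    return central_obesity and idf_components(p) >= 2
```

Because waist circumference is mandatory, a patient below the waist cut-off is classified as not having MS regardless of how many other components are altered.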
Data collection was carried out in two stages. First, volunteers were recruited at the Nephrology Center, and those who agreed to participate signed the Informed Consent Form (ICF). They were then interviewed using a semi-structured questionnaire, followed by anthropometric and hemodynamic assessments. Laboratory tests were scheduled for the following day.
Anthropometric, biochemical, hemodynamic, and lifestyle data were evaluated. The semi-structured questionnaire covered sociodemographic characteristics, lifestyle, and self-reported personal history, such as hypertension and diabetes mellitus. Anthropometric measurements were taken in duplicate, and the means were used for data analysis. All variables were measured according to protocols established in the literature 20: weight, height, arm circumference (AC), waist circumference (WC), hip circumference (HC), neck circumference (NC), and calf circumference (CC).
The anthropometric indices waist-to-hip ratio (WHR), waist-to-height ratio (WHtR), and body mass index (BMI) were used to determine nutritional status, based on WHO cutoff points. WC was classified using the cutoff points for the South American population: ≥ 90 cm for men and ≥ 80 cm for women 19.
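The three indices are simple ratios of the measured variables. The sketch below computes them and maps BMI onto the standard WHO adult categories; the function names are hypothetical, and the study's exact cut-offs for WHR and WHtR are not given in the text, so they are not encoded here.

```python
def anthropometric_indices(weight_kg: float, height_m: float,
                           waist_cm: float, hip_cm: float) -> dict:
    """BMI (kg/m²), waist-to-hip ratio, and waist-to-height ratio."""
    return {
        "BMI": weight_kg / height_m ** 2,
        "WHR": waist_cm / hip_cm,
        "WHtR": waist_cm / (height_m * 100),  # height converted to cm
    }


def who_bmi_category(bmi: float) -> str:
    """Standard WHO adult BMI categories."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    return "obesity"
```

For example, a person weighing 80 kg at 1.60 m with a 96 cm waist and 100 cm hips has BMI 31.25 (obesity), WHR 0.96, and WHtR 0.60.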
Only the k-nearest neighbors (KNN) classifier, a supervised ML method, was used for MS screening. It is simple to implement and program and is adaptable; these advantages motivated our choice, as KNN alone met our needs 21.
KNN classifies a new case by finding the k closest cases in the training data, according to a chosen distance metric, and assigning the class held by the majority of these neighbors. Before classification, the dataset is prepared by removing missing values and normalizing features, a step known as pre-processing. The data are then randomly divided into two sets, a training set and a test set, which ensures an adequate representation of the patterns for training and a robust performance evaluation 21.
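The pre-processing and classification steps can be sketched in plain Python. This is an illustrative sketch, not the MATLAB implementation used in the study: it assumes Euclidean distance, z-score normalization fitted on the training set only, and k = 3, none of which are reported in the text; all function names are hypothetical.

```python
import math
from collections import Counter


def drop_missing(rows, labels):
    """Pre-processing: discard records with any missing (None) feature."""
    kept = [(r, y) for r, y in zip(rows, labels) if None not in r]
    return [r for r, _ in kept], [y for _, y in kept]


def zscore_fit(rows):
    """Per-feature mean and standard deviation (1.0 for constant features)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def zscore_apply(rows, means, stds):
    """Normalize rows with statistics previously fitted on the training set."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to the query."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```

Fitting the normalization on the training rows and reusing it for test rows keeps test-set information out of the model; an odd k avoids tied votes in a two-class problem.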
Accordingly, the database was divided into two groups: 80% of the sample for training and 20% for testing. Based on the patient's clinical information (gender, smoking, WHR, and NC), the classifier distinguished individuals with and without MS. The classifier was built using MATLAB® R2021a (MathWorks Inc., Natick, MA, USA).
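Turning the four clinical inputs into feature vectors and making a random 80/20 split might look like the sketch below. The record keys (`sex`, `smoker`, `neck_cm`, `whr`), the 0/1 coding of gender and smoking, and the fixed seed are hypothetical choices for illustration; the text does not describe how the inputs were encoded.

```python
import random


def encode(record):
    """Map one patient record to the four numeric KNN inputs.
    Hypothetical coding: sex and smoking as 0/1, NC in cm, WHR unitless."""
    return [
        1.0 if record["sex"] == "male" else 0.0,
        1.0 if record["smoker"] else 0.0,
        float(record["neck_cm"]),
        float(record["whr"]),
    ]


def train_test_split(rows, labels, train_frac=0.8, seed=42):
    """Shuffle indices with a fixed seed, then cut them into train and test."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(idx))
    tr, te = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [labels[i] for i in tr],
            [rows[i] for i in te], [labels[i] for i in te])
```

With 196 records, an 80/20 split yields 157 training and 39 test cases. A stratified split, which preserves the proportion of MS cases in both sets, is a common alternative to the purely random split described in the text.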
SPSS® version 25 (SPSS Inc., Chicago, IL, USA) was used for data entry and statistical analysis. The Kolmogorov-Smirnov test was used to assess the normality of the data. Normally distributed variables were compared using Student's t-test and the others using the Mann-Whitney U test. The Receiver Operating Characteristic (ROC) curve was used to evaluate the classifier. Results were considered statistically significant at p < 0.05. All statistical values are reported in the tables in the Results section.
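For the non-normally distributed variables, the Mann-Whitney U test compares the ranks of the two groups. The sketch below is a simplified illustration, not the SPSS procedure: it assigns average ranks to ties but uses the large-sample normal approximation without tie or continuity correction, so for small groups an exact test (for example, `scipy.stats.mannwhitneyu`) is preferable. The function name is hypothetical.

```python
import math


def mann_whitney_u(a, b):
    """Return (U for group a, two-sided p) via the normal approximation."""
    pooled = sorted((v, g) for g, xs in enumerate((a, b)) for v in xs)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share their mean 1-based rank
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return u1, p
```

For two completely separated groups such as [1, 2, 3] and [4, 5, 6], U = 0 and the approximate two-sided p-value is about 0.05.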
This study is part of a larger ("umbrella" or "root") project entitled "Prediction of Chronic Kidney Disease Using Artificial Neural Networks". It was approved by the Research Ethics Committee of the Federal University of Maranhão (CAAE no. 67030517.5.0000.5087), and all participants signed the informed consent form. This work uses original population data generated exclusively for this research and also provides new data for the umbrella project. Both neural networks and KNN are artificial intelligence models that process data for classification and prediction.
Results
Table 3 shows the general characteristics of the sample, stratified according to the presence of mild or severe CKD. Patients with severe CKD had significantly higher values of systolic and diastolic blood pressure, age, CP, fasting blood glucose, and urea (p < 0.05) than those with mild CKD. Similarly, the obesity indicators (WC, BMI, WHR, WHtR) also showed a higher prevalence in the severe CKD group.
For clinical, epidemiological, didactic, and conceptual purposes, CKD is classified into six functional stages according to the degree of renal function, based on GFR, ranging from normal/high function to dialysis or transplantation. The Kidney Disease: Improving Global Outcomes (KDIGO) guideline recommends estimating GFR from serum creatinine as the preferred method to diagnose, classify, and monitor the progression of CKD 4. GFR was categorized from G1 to G5, as shown in Table 4.
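The six functional stages correspond to the KDIGO GFR categories G1, G2, G3a, G3b, G4, and G5, with G3 split into two sub-categories. A minimal mapping from an eGFR value (mL/min/1.73 m²) to its category, assuming these standard KDIGO cut-offs and a hypothetical function name, could be:

```python
def kdigo_gfr_category(egfr: float) -> str:
    """KDIGO GFR category for an eGFR in mL/min/1.73 m²."""
    if egfr >= 90:
        return "G1"   # normal or high
    if egfr >= 60:
        return "G2"   # mildly decreased
    if egfr >= 45:
        return "G3a"  # mildly to moderately decreased
    if egfr >= 30:
        return "G3b"  # moderately to severely decreased
    if egfr >= 15:
        return "G4"   # severely decreased
    return "G5"       # kidney failure
```

Under the study's grouping, G1 and G2 (eGFR ≥ 60) fall in the mild CKD group, and G3a through G5 fall in the severe group.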
Regarding CKD staging and classification, Table 4 shows that a large part of the sample was in the initial stages (G1 and G2), all from the mild CKD group. In contrast, only 1.4% (n = 4) were in the end stage and required some type of renal replacement therapy.
Table 5 shows the prevalence of MS risk factors stratified by the presence of mild or severe CKD. In percentage terms, all altered MS components were more prevalent in the severe CKD group, except total cholesterol (TC). Of the investigated sample, 45.8% of patients with severe CKD had MS, and most had up to three altered metabolic components.
In Table 3, stratified by the presence of CKD, the group with kidney problems had higher mean/median values in all parameters than healthy individuals, with statistical significance for WC, systolic blood pressure, diastolic blood pressure, and fasting blood glucose.
Regarding the ML-based software, the implemented algorithm was KNN with the following inputs: gender, smoking, WHR, and NC. The training labels (MS present or absent) were assigned according to the IDF criteria 18,19.
There is no fixed train/test split ratio that works in all scenarios for the KNN algorithm. The database was divided into two groups: 80% of the sample for training and 20% for testing. Other ratios, such as 70/30 or, with very large datasets, even 90/10, may also be feasible 22. All of these splits were tested, and 80/20 gave the best response for this medium-sized database. The KNN had 79% accuracy, 79% sensitivity, and 80% specificity.
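The reported accuracy, sensitivity, and specificity all follow from the confusion matrix on the test set. A minimal sketch, assuming MS is coded as 1 and its absence as 0 (the function name is hypothetical):

```python
def screening_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from paired test-set labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

For example, with 4 MS cases of which 3 are detected, and 6 non-MS cases of which 5 are correctly rejected, accuracy is 0.80, sensitivity 0.75, and specificity about 0.83.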
Figure 1 shows the ROC curve (area under the ROC curve, AUC = 0.79), generated by plotting sensitivity (true positive rate) on the y-axis against 1 - specificity (false positive rate) on the x-axis. KNN therefore proved to be a good classifier for predicting MS. For a diagnostic test to be considered accurate, its curve must lie in the upper-left triangle, above the reference line; the closer the curve is to the upper-left corner (i.e., the closer the AUC is to 1), the better the method's performance 23,24.
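An ROC curve is obtained by sweeping a decision threshold over the classifier's scores and recording the false and true positive rates at each step; the AUC is the area under the resulting points. The sketch below assumes that each test patient has a score, for KNN plausibly the fraction of the k neighbors labeled MS; the text does not state how the scores behind Figure 1 were produced, and the function names are hypothetical. The AUC uses the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points from sweeping a threshold down through the scores."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Process all cases tied at this score together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A perfect ranking of cases gives AUC = 1.0, while a random one tends toward 0.5, the diagonal reference line.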
In this first stage of the study, we chose these anthropometric input parameters because they are inexpensive, easy to obtain, and already recommended in the literature. However, using more input variables may increase the specificity of the classifier. The software is simple to use. It has four fields for entering the patient's clinical data and three buttons: calculate (classifies the patient from the entered clinical data), clear (deletes the data entered on the screen), and close (exits the software). The classifier's output is displayed in the "Result" field as either high or normal MS risk for CKD patients, as shown in Figure 2.

Discussion
Corroborating our results, several studies have reported a higher incidence of MS in women with severe CKD and a higher prevalence in the elderly 19-25. Age is considered a risk factor for both MS and CKD. In this study, the high prevalence of MS in elderly people may be explained by functional limitations, an increasingly sedentary lifestyle, and reduced physical activity, as described in other reports 26.
Our results reveal that abdominal obesity, altered blood pressure, and high blood glucose were associated with CKD. In addition, the main underlying diseases in the study population were systemic arterial hypertension (SAH) and diabetes mellitus (DM). These findings agree with studies in which central obesity was associated with CKD, independently of general obesity and increased BMI 25,27,28. In MS, abdominal obesity is one of the main components responsible for insulin resistance, which in turn can lead to progressive loss of renal function 29. Substantial evidence shows that obesity directly affects renal hemodynamics and structure 30.
Among CNCDs, SAH and DM were the main underlying diseases in our study. These conditions are among the most prevalent risk factors for the development of CKD and account for most cases. In Brazil, according to the Census of Hemodialysis Centers, hypertensive nephropathy (34%) and diabetes (31%) were the main underlying diseases in patients undergoing hemodialysis in 2019 31.
In this study, MS (according to the IDF definition) was associated with an increased risk of CKD 18. These findings call for greater attention to policies and interventions, such as lifestyle changes, aimed at reducing the prevalence of MS and its adverse outcomes. The literature suggests that prevention efforts should start early, as soon as any of the components of MS is present 19.
Public health strategies are thus important for the prevention of CNCDs in general. To this end, the Strategic Action Plan for Confronting CNCDs 2011-2022 was launched to promote the development of public policies. It aimed at the prevention and control of CNCDs and their complications, in order to reduce premature mortality (30 to 69 years) by 2% per year and to reduce the prevalence of their risk factors 32.
In this study, MS was more prevalent in the severe CKD group. MS is known to negatively affect the progression of CKD, so MS and its associated factors need to be identified early. However, a drawback of MS evaluation is that all MS diagnostic criteria include invasive variables 18,33.
In developing countries such as Brazil, the population's difficulty in accessing primary health care services, specialized consultations, and complementary exams contributes to the underreporting of CNCDs, including MS and CKD. Therefore, complementary, cost-effective, and easy-to-use diagnostic methods are needed to facilitate patients' access to early diagnosis 14-16.
In this sense, machine learning techniques appear to be an instrument to assist in the development of new low-cost screening methods that do not rely on invasive variables 34. Therefore, one of the aims of this study was to develop software with these characteristics, based on the KNN algorithm, that can be used to evaluate MS in patients with CKD.
In this study, KNN performs a binary classification, assigning each patient to one of two classes (with or without the condition) based on the evaluation of the collected data. Several studies have applied KNN to classify groups in biological data with satisfactory results, indicating that it is a suitable method for the present work 21,34.
Given the demand for methods that facilitate diagnosis and optimize the care provided by health professionals, several studies have applied ML in healthcare, including prediction of cardiovascular disease over a 10-year period 35, predictive models for undiagnosed DM 36, and diagnosis of Parkinson's disease from patients' handwriting using image processing techniques 37.
In the present study, based on the evaluation metrics of the chosen classifier and the area under the ROC curve (AUC = 0.79), KNN was found to be a good predictor for MS screening. In Rosa's study on predicting metabolic syndrome in antiretroviral users, the KNN algorithm also proved to be a good predictor, with AUC = 0.78, similar to our result 38.
This result can be attributed to input parameters such as NC and WHR. This is supported by several findings in the literature that report the importance of anthropometric variables for clinical practice, identifying NC 37,39 and WHR 40,41 as good predictors of MS.
Similarly, smoking is an established indicator in the literature for cardiometabolic risk analysis 42,43. Smoking remains the leading cause of preventable death worldwide and a crucial factor in the development of CNCDs such as cancer and cardiovascular and pulmonary diseases. Gender, in turn, has been associated with MS and CKD, with a higher prevalence among women 43,44.
The advantages of KNN are its simplicity of implementation, its effective performance in different situations and fields (engineering, health, and education, among others), its ease of interpretation, and its suitability for small or medium-sized databases. Furthermore, it produces the decision rule directly, without estimating the class-conditional densities, which makes it a good choice for classification problems in which patterns close to each other in the feature space tend to belong to the same class 21.
Thus, the KNN algorithm proved to be an attractive tool that can be implemented in low-resource settings. It can also support the clinical management of MS in the general population and especially in the CKD population, given the association between the number of MS components and CKD progression. It may also have a positive impact on the quality of life of this population.
However, the study has some limitations. The first was the COVID-19 pandemic, which made data collection difficult because the study population was considered at risk for the coronavirus. The number of input variables is another limiting factor, since using more input variables may increase the specificity of the classifier. A further limitation is the possibility of competing risks, in which patients experience other types of events that prevent observation of the event under study. Finally, because this was a cross-sectional study, patients could not be followed over time, and no cause-effect relationship could be established.
These factors limited the sample, so the results should be considered valid only for the population studied. For future investigations on this topic, we recommend further developing the classifier algorithm, using a larger sample that includes other populations with other comorbidities, increasing the number of input variables, and adopting a longitudinal design. A validation study of the implemented software, additional statistical tests, and comparisons with other algorithms, with or without the same proposed methods, could also be carried out.

Conclusion
In summary, there was a significant prevalence of MS in the CKD population, demonstrating the importance of early detection of this syndrome: MS involves a cluster of cardiometabolic factors, and the more components are present, the greater the risk of CKD progression. The KNN classifier can be used as a screening method with high sensitivity and low cost. It can be used to screen for MS in primary healthcare units and low-resource settings, contributing to the early detection of MS in the general population and especially in CKD. The classifier can help health professionals make decisions that support preventive care, especially in CKD, reducing treatment costs and helping to avoid negative outcomes and early death.
Given its potential usefulness for public health, future research could carry out larger studies with the KNN classifier and its variants. Other low-cost parameters could also be used to predict the risk of harmful clinical conditions in groups with and without risk of chronic non-communicable diseases.

Figure 1. Area under the ROC curve demonstrating the discriminatory power of the k-nearest neighbors algorithm in predicting metabolic syndrome in the test set.

Figure 2. User interface layout (A) and example with input parameters (B).

Table 3. Anthropometric, hemodynamic, and laboratory characteristics of the sample, stratified by the presence of chronic kidney disease (CKD)
Abbreviation: GFR, glomerular filtration rate. Notes: * values are described as frequency (n) and percentage (%).

Table 4. Staging and classification of chronic kidney disease (CKD) in the sample (n = 196)

Table 5. Prevalence of metabolic syndrome and its components in the sample